Malware classification, a simple comparison:

dynamic analysis features, text vectorizers, metrics and winning models

intro

TL;DR

This notebook takes cuckoo-sandbox reports and processes them into long sentences built from system calls ('syscalls').

It then labels them according to VirusTotal ('VT') antivirus results.

Then we build features from the sentences using n-gram frequencies (we use the frequency method for simplicity; TF-IDF could have been used just as well).

We train different models on these features, compare simple metrics and curves, and finally announce the winning classifiers.

TL;DR (translated from the Hebrew)

A notebook that processes cuckoo reports and turns them into long sentences of syscalls, each with a matching label from VirusTotal.

The sentences are then turned into features using the occurrence frequency of n-grams (TF-IDF could be used just as well),

and classic and other models are run (logistic regression / random forest).

Finally, we compare metrics and plots and announce the winning classifiers.

Dataset

We use a fraction of the Berkeley Detection Platform dataset, a malware dataset containing over 1 million samples collected over a period of 2.5 years.

http://secml.cs.berkeley.edu/detection_platform/

##################### results ###########################
### comparing between different train-test-splits ###
## considering the first-seen-time of the malwares ##

-- running on 1000 samples from miller's data-set (50% malicious, 50% benign)
-- foreach sample - reading 500-2000 syscalls from *all* processes (recorded by cuckoo) sorted by *timestamp*
-- got total 45 different syscalls (e.g, in XP there are 300 possible NT-calls)
-- features are syscalls ngrams, with intervals: (1,1) (1,2) (2,2) ... (n=4, n=4)
-- comparing lots of classifiers: LR, RF, DecisionTree, MLP(NN), KNN, L-SVM, RBF-SVM, AdaBoost, GNB, QDA

####### test-first_seen and train-first_seen are random:
train: 2012-01-01 - 2014-06-01
test: 2012-01-01 - 2014-06-01
Top 3 models: (by precision metric)
{'model': 'AdaBoost with (3, 3)-Grams-CountVectorizer-vectorizer', 'score': 0.914}
{'model': '10 Random Forest 5-max-depth with (3, 3)-Grams-CountVectorizer-vectorizer', 'score': 0.907}
{'model': 'Neural Net with (1, 4)-Grams-CountVectorizer-vectorizer', 'score': 0.900}

####### test-first_seen is right after train-first_seen:
train: 2012-01-01 - 2013-06-01
test: 2013-06-01 - 2013-12-01
Top 3 models: (by precision metric)
{'model': 'Neural Net with (2, 4)-Grams-CountVectorizer-vectorizer', 'score': 0.919}
{'model': 'Neural Net with (2, 3)-Grams-CountVectorizer-vectorizer', 'score': 0.916}
{'model': '10 Random Forest 5-max-depth with (2, 3)-Grams-TfidfVectorizer-vectorizer', 'score': 0.914}

####### test-first_seen is 4-months after train-first_seen:
train: 2012-01-01 - 2013-06-01
test : 2013-10-01 - 2014-06-01
1. Top 3 models: (by precision metric)
{'model': 'AdaBoost with (4, 4)-Grams-CountVectorizer-vectorizer', 'score': 0.803}
{'model': '10 Random Forest 5-max-depth with (2, 4)-Grams-TfidfVectorizer-vectorizer', 'score': 0.781}
{'model': 'RF with (1, 1)-Grams-CountVectorizer-vectorizer', 'score': 0.779}
2. Top 3 models: (by precision metric)
{'model': '10 Random Forest 5-max-depth with (4, 4)-Grams-TfidfVectorizer-vectorizer', 'score': 0.909}
{'model': 'RF with (1, 2)-Grams-CountVectorizer-vectorizer', 'score': 0.859}
{'model': '10 Random Forest 5-max-depth with (1, 4)-Grams-TfidfVectorizer-vectorizer', 'score': 0.833}
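The three setups above differ only in how samples are assigned to train and test by their first-seen date. The code that produced these splits is not part of this notebook, so the following is only a minimal sketch of the idea. The `first_seen` field, the helper name `split_by_first_seen`, and the half-open window convention (start inclusive, end exclusive) are assumptions made for illustration; the dates are copied from the "4-months after" setup.

```python
from datetime import date

def split_by_first_seen(samples, train_window, test_window):
    """Assign samples to train/test by first-seen date.

    Windows are (start, end) pairs; start is inclusive and end is exclusive
    (an assumption, the original convention is not stated).
    """
    train, test = [], []
    for sample in samples:
        seen = sample["first_seen"]
        if train_window[0] <= seen < train_window[1]:
            train.append(sample)
        elif test_window[0] <= seen < test_window[1]:
            test.append(sample)
        # samples that fall in the gap between the windows are dropped
    return train, test

# Invented samples, for illustration only
samples = [
    {"sha256": "a", "first_seen": date(2012, 5, 1)},
    {"sha256": "b", "first_seen": date(2013, 7, 15)},  # inside the 4-month gap
    {"sha256": "c", "first_seen": date(2013, 11, 2)},
]

train, test = split_by_first_seen(
    samples,
    train_window=(date(2012, 1, 1), date(2013, 6, 1)),
    test_window=(date(2013, 10, 1), date(2014, 6, 1)),
)
train_ids = [s["sha256"] for s in train]
test_ids = [s["sha256"] for s in test]
print(train_ids, test_ids)
```

Leaving a gap between the windows keeps every test sample strictly newer than every training sample, which is the setting where the reported precision drops the most.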

project imports

In [1]:
#### project imports

# basis
import os
import json
import sys
import itertools
from pathlib import Path
import pandas
import numpy

# feature_extraction
from sklearn import model_selection
from sklearn import preprocessing
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# models
from sklearn import linear_model
from sklearn import ensemble
from sklearn.neural_network import MLPClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

# metrics and curves
from sklearn.metrics import roc_curve
from sklearn.metrics import precision_recall_curve
from sklearn.metrics import f1_score
from sklearn.metrics import auc
from sklearn.metrics import accuracy_score
from sklearn.metrics import precision_score
from sklearn.metrics import recall_score
from sklearn.metrics import roc_auc_score
import matplotlib.pyplot as plt

# nir added for sort-by-timestamp
from datetime import datetime

setting paths for the data directory

In [2]:
# TODO: set data dir 
cuckoo_reports_dir_Path = Path("/home/john/Desktop/SCLR/IGNORE/miller700_150_ratio05/")
labels_dict_Path = Path("/home/john/Desktop/SCLR/IGNORE/miller700_150_ratio05/labels.json")
!file /home/john/Desktop/SCLR/IGNORE/miller700_150_ratio05/test/0009e4770ea519865c55e9e8dff5c9a537740761a14e8aac3c69bb7d47a92115
/home/john/Desktop/SCLR/IGNORE/miller700_150_ratio05/test/0009e4770ea519865c55e9e8dff5c9a537740761a14e8aac3c69bb7d47a92115: ASCII text, with very long lines, with no line terminators

load syscalls data

for each cuckoo report, sort the syscalls (from all processes) by timestamp, and create a sentence from the first 'max_seq_len' of them.

In [3]:
max_seq_len = 500  # lowest limit = 500
labels = {}
processed_sequences = {}

with open(labels_dict_Path, 'rb') as labels_dict_handle:
    labels_dict = json.load(labels_dict_handle)


# 'split' (rather than 'set') avoids shadowing the Python builtin
for split in ["train", "test"]:
    labels[split] = []
    processed_sequences[split] = []
    for report_file in os.listdir(cuckoo_reports_dir_Path/split):
        with open(file=cuckoo_reports_dir_Path/split/report_file, mode='r', encoding="latin-1") as report_file_handle:
            try:
                report_data = json.load(report_file_handle)

                # collect (timestamp, syscall) records from every process
                extracted_calls = []
                for proc_dict in report_data['behavior']['processes']:
                    # only the first max_seq_len calls of each process are considered
                    for call_dict in itertools.islice(proc_dict["calls"], 0, max_seq_len):
                        extracted_calls.append({
                            'date_time_obj': datetime.strptime(call_dict['timestamp'], '%Y%m%d%H%M%S.%f'),
                            'syscall_str': call_dict['api'],
                        })

                # sort all calls by time and keep the earliest max_seq_len
                extracted_calls = sorted(extracted_calls,
                                         key=lambda call: call['date_time_obj'])[0:max_seq_len]
                seq_sentence = ' '.join(d['syscall_str'] for d in extracted_calls)

                # extract label for this sample
                sha256 = report_data['sha256']
                score = numpy.float64(labels_dict[sha256]['score'])
                syscalls_label = 1 if score > 5 else 0

                # append sentence and label together, only if no exception occurred,
                # so that the two lists always stay aligned
                processed_sequences[split].append(seq_sentence)
                labels[split].append(syscalls_label)

            except json.JSONDecodeError as err:
                print("process report {0}   JSONDecodeError error: {1}".format(report_file, err))
            except Exception:
                print("process_reports_file: {0}, error: {1}".format(report_file, sys.exc_info()))
process_reports_file: dc4c50088b5212c6cd188ea2c49817c8cde7c6068b4157f47abf03d2e7349907, error: (<class 'ValueError'>, ValueError('time data \'1524"\' does not match format \'%Y%m%d%H%M%S.%f\''), <traceback object at 0x7fca35058a00>)
In [5]:
processed_sequences["train"][0][0:100]
Out[5]:
'DeviceIoControl LoadLibraryA LoadLibraryA RegOpenKeyExW RegOpenKeyExW RegQueryValueExW RegOpenKeyExW'
In [4]:
len(processed_sequences["train"])
Out[4]:
699
In [19]:
len(processed_sequences["test"])
Out[19]:
148
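To make the loading step easier to follow, here is a stripped-down sketch of what happens to a single report: calls from all processes are merged, sorted by timestamp, truncated, and joined into one sentence, and the sample is labeled by thresholding its score. The `report` dictionary is an invented, heavily trimmed stand-in for a real cuckoo report, and `report_to_sentence` and `label_from_score` are hypothetical helper names. Unlike the cell above, this sketch does not pre-truncate each process before merging.

```python
from datetime import datetime

MAX_SEQ_LEN = 5  # tiny limit for illustration; the notebook uses 500

# Invented mini-report with the same nesting the notebook reads
report = {
    "behavior": {"processes": [
        {"calls": [
            {"timestamp": "20130601120000.300", "api": "RegOpenKeyExW"},
            {"timestamp": "20130601120000.100", "api": "LoadLibraryA"},
        ]},
        {"calls": [
            {"timestamp": "20130601120000.200", "api": "DeviceIoControl"},
        ]},
    ]},
}

def report_to_sentence(report, max_seq_len):
    """Merge calls of all processes, sort by time, keep the earliest ones."""
    calls = []
    for proc in report["behavior"]["processes"]:
        for call in proc["calls"]:
            ts = datetime.strptime(call["timestamp"], "%Y%m%d%H%M%S.%f")
            calls.append((ts, call["api"]))
    calls.sort(key=lambda pair: pair[0])
    return " ".join(api for _, api in calls[:max_seq_len])

def label_from_score(score, threshold=5):
    """1 (malicious) if the score is strictly above the threshold, else 0."""
    return 1 if score > threshold else 0

sentence = report_to_sentence(report, MAX_SEQ_LEN)
print(sentence)
print(label_from_score(7), label_from_score(5))
```

Note that a score exactly equal to the threshold is labeled benign, matching the `score > 5` comparison in the cell above.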

convert data to pandas.DataFrame

In [7]:
trainDF = pandas.DataFrame()
trainDF['text'] = processed_sequences["train"]
trainDF['label'] = labels["train"]
train_x, train_y = trainDF['text'], trainDF['label']

testDF = pandas.DataFrame()
testDF['text'] = processed_sequences["test"]
testDF['label'] = labels["test"]
test_x, test_y = testDF['text'], testDF['label']

counting the number of distinct syscalls in the data

In [8]:
count_vect = CountVectorizer(analyzer='word', token_pattern=r'\w{1,}', ngram_range=(1,1))
cv_fit = count_vect.fit_transform(train_x)
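# note: get_feature_names() was removed in scikit-learn 1.2;
# on newer versions use get_feature_names_out() instead
print(count_vect.get_feature_names()[0:10])
print(len(count_vect.get_feature_names()))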
['controlservice', 'copyfileexw', 'createfilew', 'createmutexw', 'createprocessinternalw', 'createremotethread', 'createservicea', 'deletefilew', 'deleteservice', 'deviceiocontrol']
42
In [9]:
itidf_vect = TfidfVectorizer(analyzer='word', token_pattern=r'\w{1,}', ngram_range=(1,1))
itidf_fit = itidf_vect.fit_transform(train_x)
# note: on scikit-learn >= 1.2 use get_feature_names_out() here as well
print(itidf_vect.get_feature_names()[0:10])
print(len(itidf_vect.get_feature_names()))
['controlservice', 'copyfileexw', 'createfilew', 'createmutexw', 'createprocessinternalw', 'createremotethread', 'createservicea', 'deletefilew', 'deleteservice', 'deviceiocontrol']
42
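The vocabulary entries above are lowercase because the vectorizers lowercase their input by default before tokenizing. As a rough illustration of what counting word n-grams means for a syscall sentence, here is a plain-Python sketch. It is not scikit-learn's implementation; `ngram_counts` is a hypothetical helper that reuses the same `\w{1,}` token pattern.

```python
import re
from collections import Counter

def ngram_counts(sentence, ngram_range):
    """Count word n-grams for every n in the inclusive range (min_n, max_n)."""
    tokens = re.findall(r"\w{1,}", sentence.lower())
    min_n, max_n = ngram_range
    counts = Counter()
    for n in range(min_n, max_n + 1):
        for start in range(len(tokens) - n + 1):
            counts[" ".join(tokens[start:start + n])] += 1
    return counts

sentence = "LoadLibraryA LoadLibraryA RegOpenKeyExW"
unigrams = ngram_counts(sentence, (1, 1))
bigrams = ngram_counts(sentence, (2, 2))
print(dict(unigrams))
print(dict(bigrams))
```

With the range (1, 1) the features are simply per-syscall counts; larger n adds short call sequences, so the feature space grows quickly with the number of distinct syscalls.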

build model and features with sklearn

declare a function: train a model, compute metrics, and plot ROC and precision-recall (P@R) curves on the test set

In [10]:
def train_test_model(model, model_name, trainX, trainy, testX, testy, is_neural_net=False):
    
    # fit the training dataset on the classifier
    model.fit(trainX, trainy)
    # predict probabilities
    lr_probs = model.predict_proba(testX)
    # keep probabilities for the positive outcome only
    lr_probs = lr_probs[:, 1]
    # predict class values
    yhat = model.predict(testX)
    lr_precision, lr_recall, _ = precision_recall_curve(testy, lr_probs)
    lr_PR_f1, lr_PR_auc = f1_score(testy, yhat), auc(lr_recall, lr_precision)
    
    # nir added metrics
    acc = accuracy_score(testy, yhat)
    precision, recall, ROC_auc = precision_score(testy, yhat), recall_score(testy, yhat), roc_auc_score(testy, lr_probs)
    # summarize scores
    print('\t Model: precision=%.3f recall=%.3f ROC-auc=%.3f accuracy=%.3f P@R-f1=%.3f P@R-auc=%.3f' % (precision, recall, ROC_auc, acc, lr_PR_f1, lr_PR_auc))
    
    # create 2 plots:
    _, axes = plt.subplots(1, 2, figsize=(20, 5))
    # plot the precision-recall curves
    no_skill = len(testy[testy==1]) / len(testy)
    axes[0].plot([0, 1], [no_skill, no_skill], linestyle='--', label='No Skill')
    axes[0].plot(lr_recall, lr_precision, marker='.', label='Model')
    # set title
    axes[0].set_title('Precision@Recall-Curve ('+model_name+')')
    # axis labels
    axes[0].set_xlabel('Recall')
    axes[0].set_ylabel('Precision')
    # show the legend
    axes[0].legend()    
    # generate a no skill prediction (majority class)
    ns_probs = [0 for _ in range(len(testy))]
    # calculate scores
    ns_auc = roc_auc_score(testy, ns_probs)
    lr_auc = roc_auc_score(testy, lr_probs)
    # summarize scores
    print('\t No Skill: ROC AUC=%.3f' % (ns_auc))
    print('\t Model: ROC AUC=%.3f' % (lr_auc))
    # calculate roc curves
    ns_fpr, ns_tpr, _ = roc_curve(testy, ns_probs)
    lr_fpr, lr_tpr, _ = roc_curve(testy, lr_probs)
    # plot the roc curve for the model
    axes[1].plot(ns_fpr, ns_tpr, linestyle='--', label='No Skill')
    axes[1].plot(lr_fpr, lr_tpr, marker='.', label='Model')
    # set title
    axes[1].set_title('ROC-curve ('+model_name+')')
    # axis labels
    axes[1].set_xlabel('False Positive Rate')
    axes[1].set_ylabel('True Positive Rate')
    # show the legend
    axes[1].legend()

    return precision 
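
The function returns precision, which is the score later used to rank the models. As a reminder of what the printed numbers mean, the following plain-Python sketch computes precision, recall, F1 and the no-skill baseline by hand on invented labels. `precision_recall` is a hypothetical helper, not the scikit-learn function.

```python
def precision_recall(y_true, y_pred):
    """Precision and recall for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Invented labels: 3 malicious and 3 benign samples
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]

precision, recall = precision_recall(y_true, y_pred)
f1 = 2 * precision * recall / (precision + recall)
# share of positives: the precision of a classifier that flags everything
no_skill = sum(y_true) / len(y_true)
print(round(precision, 3), round(recall, 3), round(f1, 3), no_skill)
```

On a balanced test set the no-skill line of the precision-recall plot sits at 0.5, so a precision close to 0.5 means the model is barely better than flagging every sample.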

comparing the classifiers:

Scikit-learn built-in models with Ngram from the syscalls sequences

In [11]:
# suppressing warnings for pretty prints
import warnings
warnings.filterwarnings("ignore")
In [12]:
n = 4
ngrams_dict_list = []
for i in range(1, n+1):
    for j in range(1, i+1):
        ngram = (j,i)
        ngram_name = str(ngram)+'-Grams'
        ngrams_dict_list.append((ngram, ngram_name))
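For reference, the nested loop above enumerates every (min_n, max_n) pair with min_n <= max_n <= n. A compact equivalent, shown only to make the resulting order explicit (`ngram_ranges` is a name introduced here for illustration):

```python
n = 4
# same order as the nested loop: the outer index is max_n, the inner is min_n
ngram_ranges = [(low, high) for high in range(1, n + 1) for low in range(1, high + 1)]
print(ngram_ranges)
# [(1, 1), (1, 2), (2, 2), (1, 3), (2, 3), (3, 3), (1, 4), (2, 4), (3, 4), (4, 4)]
print(len(ngram_ranges))  # n * (n + 1) / 2 = 10
```

With 10 ranges, 2 vectorizers and 7 active models, the comparison loop trains 140 model/feature combinations.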
In [13]:
# some models are commented out because they
# don't accept "new syscalls"
# that didn't appear in the train set
models_and_names_list = [(linear_model.LogisticRegression(), "LR"), \
                         (ensemble.RandomForestClassifier(), "RF"), \
                         (KNeighborsClassifier(3), "3 Nearest Neighbors"), \
                         (DecisionTreeClassifier(max_depth=5), "Decision Tree"), \
                         (RandomForestClassifier(max_depth=5, n_estimators=10, max_features=1), "10 Random Forest 5-max-depth"), \
                         (MLPClassifier(alpha=1, max_iter=1000), "Neural Net"), \
                         (AdaBoostClassifier(), "AdaBoost"), \
                         #(SVC(kernel="linear", C=0.025), "Linear SVM"), \
                         #(SVC(gamma=2, C=1), "RBF SVM"), \
                         #(GaussianProcessClassifier(1.0 * RBF(1.0)), "Gaussian Process"), \
                         #(GaussianNB(), "Gaussian Naive Bayes"), \
                         #(QuadraticDiscriminantAnalysis(), "QDA")
                        ]
In [14]:
vectorizers_and_names_list = [(CountVectorizer, "CountVectorizer"), \
                              (TfidfVectorizer, "TfidfVectorizer")]
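The two vectorizers differ only in how they weight the counts. The plain-Python sketch below illustrates TF-IDF weighting on two invented syscall sentences, assuming the smoothed IDF, idf(t) = ln((1 + n) / (1 + df(t))) + 1, and the L2 row normalization that scikit-learn documents as the `TfidfVectorizer` defaults. It is an illustration of the formula, not the library's code, and the helper names are hypothetical.

```python
import math
from collections import Counter

# Invented, already-lowercased syscall sentences
docs = [
    "loadlibrarya regopenkeyexw regopenkeyexw",
    "loadlibrarya createfilew",
]
tokenized = [doc.split() for doc in docs]
n_docs = len(tokenized)

# document frequency: in how many sentences each syscall appears
df = Counter(term for tokens in tokenized for term in set(tokens))

def smoothed_idf(term):
    """Smoothed inverse document frequency (assumed scikit-learn default)."""
    return math.log((1 + n_docs) / (1 + df[term])) + 1

def tfidf_vector(tokens):
    """Raw counts times IDF, followed by L2 normalization."""
    tf = Counter(tokens)
    weights = {term: count * smoothed_idf(term) for term, count in tf.items()}
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {term: w / norm for term, w in weights.items()}

vec = tfidf_vector(tokenized[0])
print({term: round(w, 3) for term, w in vec.items()})
```

A syscall that appears in every sentence gets the minimum IDF of 1, so rarer syscalls weigh relatively more than they do under plain counting.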
In [15]:
max_score = -1
winning_model = "none"
scoreboard_dict_list = []
for ngram, ngram_name in ngrams_dict_list:
    for vectorizer, vectorizer_name in vectorizers_and_names_list:
        # create a count/tfidf vectorizer object 
        count_vect = vectorizer(analyzer='word', token_pattern=r'\w{1,}', ngram_range=ngram)
        count_vect.fit(train_x)
        
        # transform the training and test data using count vectorizer object
        xtrain_count =  count_vect.transform(train_x)
        xtest_count =  count_vect.transform(test_x)

        # build model
        for model_func, model_name in models_and_names_list:
            description = model_name + " with "  + ngram_name + "-" + vectorizer_name + "-vectorizer"
            print("model: " + description)
            print("\n")
            # run curves function
            score = train_test_model(model_func, description, xtrain_count, train_y, xtest_count, test_y)
            if score > max_score:
                max_score = score
                winning_model = description
            score_dict = {}
            score_dict['model'] = description
            score_dict['score'] = score
            scoreboard_dict_list.append(score_dict)
        
print("############################")
print("and the winner is........")
print(winning_model)
print("!!!!!!!!!!!!")
print("############################")
model: LR with (1, 1)-Grams-CountVectorizer-vectorizer


	 Model: precision=0.648 recall=0.797 ROC-auc=0.769 accuracy=0.682 P@R-f1=0.715 P@R-auc=0.795
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.769
model: RF with (1, 1)-Grams-CountVectorizer-vectorizer


	 Model: precision=0.688 recall=0.716 ROC-auc=0.784 accuracy=0.696 P@R-f1=0.702 P@R-auc=0.797
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.784
model: 3 Nearest Neighbors with (1, 1)-Grams-CountVectorizer-vectorizer


	 Model: precision=0.556 recall=0.676 ROC-auc=0.622 accuracy=0.568 P@R-f1=0.610 P@R-auc=0.689
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.622
model: Decision Tree with (1, 1)-Grams-CountVectorizer-vectorizer


	 Model: precision=0.688 recall=0.716 ROC-auc=0.685 accuracy=0.696 P@R-f1=0.702 P@R-auc=0.643
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.685
model: 10 Random Forest 5-max-depth with (1, 1)-Grams-CountVectorizer-vectorizer


	 Model: precision=0.662 recall=0.608 ROC-auc=0.743 accuracy=0.649 P@R-f1=0.634 P@R-auc=0.800
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.743
model: Neural Net with (1, 1)-Grams-CountVectorizer-vectorizer


	 Model: precision=0.623 recall=0.649 ROC-auc=0.684 accuracy=0.628 P@R-f1=0.636 P@R-auc=0.689
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.684
model: AdaBoost with (1, 1)-Grams-CountVectorizer-vectorizer


	 Model: precision=0.773 recall=0.784 ROC-auc=0.836 accuracy=0.777 P@R-f1=0.779 P@R-auc=0.853
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.836
model: LR with (1, 1)-Grams-TfidfVectorizer-vectorizer


	 Model: precision=0.649 recall=0.649 ROC-auc=0.665 accuracy=0.649 P@R-f1=0.649 P@R-auc=0.727
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.665
model: RF with (1, 1)-Grams-TfidfVectorizer-vectorizer


	 Model: precision=0.716 recall=0.716 ROC-auc=0.779 accuracy=0.716 P@R-f1=0.716 P@R-auc=0.776
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.779
model: 3 Nearest Neighbors with (1, 1)-Grams-TfidfVectorizer-vectorizer


	 Model: precision=0.574 recall=0.784 ROC-auc=0.679 accuracy=0.601 P@R-f1=0.663 P@R-auc=0.734
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.679
model: Decision Tree with (1, 1)-Grams-TfidfVectorizer-vectorizer


	 Model: precision=0.690 recall=0.811 ROC-auc=0.736 accuracy=0.723 P@R-f1=0.745 P@R-auc=0.685
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.736
model: 10 Random Forest 5-max-depth with (1, 1)-Grams-TfidfVectorizer-vectorizer


	 Model: precision=0.819 recall=0.797 ROC-auc=0.867 accuracy=0.811 P@R-f1=0.808 P@R-auc=0.880
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.867
model: Neural Net with (1, 1)-Grams-TfidfVectorizer-vectorizer


	 Model: precision=0.580 recall=0.689 ROC-auc=0.649 accuracy=0.595 P@R-f1=0.630 P@R-auc=0.710
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.649
model: AdaBoost with (1, 1)-Grams-TfidfVectorizer-vectorizer


	 Model: precision=0.645 recall=0.811 ROC-auc=0.799 accuracy=0.682 P@R-f1=0.719 P@R-auc=0.817
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.799
model: LR with (1, 2)-Grams-CountVectorizer-vectorizer


	 Model: precision=0.674 recall=0.838 ROC-auc=0.800 accuracy=0.716 P@R-f1=0.747 P@R-auc=0.815
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.800
model: RF with (1, 2)-Grams-CountVectorizer-vectorizer


	 Model: precision=0.859 recall=0.743 ROC-auc=0.846 accuracy=0.811 P@R-f1=0.797 P@R-auc=0.863
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.846
model: 3 Nearest Neighbors with (1, 2)-Grams-CountVectorizer-vectorizer


	 Model: precision=0.559 recall=0.703 ROC-auc=0.645 accuracy=0.574 P@R-f1=0.623 P@R-auc=0.706
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.645
model: Decision Tree with (1, 2)-Grams-CountVectorizer-vectorizer


	 Model: precision=0.700 recall=0.757 ROC-auc=0.733 accuracy=0.716 P@R-f1=0.727 P@R-auc=0.707
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.733
model: 10 Random Forest 5-max-depth with (1, 2)-Grams-CountVectorizer-vectorizer


	 Model: precision=0.761 recall=0.689 ROC-auc=0.730 accuracy=0.736 P@R-f1=0.723 P@R-auc=0.750
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.730
model: Neural Net with (1, 2)-Grams-CountVectorizer-vectorizer


	 Model: precision=0.656 recall=0.824 ROC-auc=0.757 accuracy=0.696 P@R-f1=0.731 P@R-auc=0.784
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.757
model: AdaBoost with (1, 2)-Grams-CountVectorizer-vectorizer


	 Model: precision=0.711 recall=0.730 ROC-auc=0.779 accuracy=0.716 P@R-f1=0.720 P@R-auc=0.806
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.779
model: LR with (1, 2)-Grams-TfidfVectorizer-vectorizer


	 Model: precision=0.681 recall=0.662 ROC-auc=0.719 accuracy=0.676 P@R-f1=0.671 P@R-auc=0.747
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.719
model: RF with (1, 2)-Grams-TfidfVectorizer-vectorizer


	 Model: precision=0.696 recall=0.649 ROC-auc=0.769 accuracy=0.682 P@R-f1=0.671 P@R-auc=0.796
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.769
model: 3 Nearest Neighbors with (1, 2)-Grams-TfidfVectorizer-vectorizer


	 Model: precision=0.573 recall=0.743 ROC-auc=0.663 accuracy=0.595 P@R-f1=0.647 P@R-auc=0.711
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.663
model: Decision Tree with (1, 2)-Grams-TfidfVectorizer-vectorizer


	 Model: precision=0.682 recall=0.784 ROC-auc=0.741 accuracy=0.709 P@R-f1=0.730 P@R-auc=0.727
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.741
model: 10 Random Forest 5-max-depth with (1, 2)-Grams-TfidfVectorizer-vectorizer


	 Model: precision=0.558 recall=0.649 ROC-auc=0.608 accuracy=0.568 P@R-f1=0.600 P@R-auc=0.607
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.608
model: Neural Net with (1, 2)-Grams-TfidfVectorizer-vectorizer


	 Model: precision=0.650 recall=0.703 ROC-auc=0.722 accuracy=0.662 P@R-f1=0.675 P@R-auc=0.733
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.722
model: AdaBoost with (1, 2)-Grams-TfidfVectorizer-vectorizer


	 Model: precision=0.640 recall=0.770 ROC-auc=0.749 accuracy=0.669 P@R-f1=0.699 P@R-auc=0.787
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.749
model: LR with (2, 2)-Grams-CountVectorizer-vectorizer


	 Model: precision=0.674 recall=0.838 ROC-auc=0.800 accuracy=0.716 P@R-f1=0.747 P@R-auc=0.817
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.800
model: RF with (2, 2)-Grams-CountVectorizer-vectorizer


	 Model: precision=0.820 recall=0.676 ROC-auc=0.780 accuracy=0.764 P@R-f1=0.741 P@R-auc=0.806
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.780
model: 3 Nearest Neighbors with (2, 2)-Grams-CountVectorizer-vectorizer


	 Model: precision=0.568 recall=0.730 ROC-auc=0.642 accuracy=0.588 P@R-f1=0.639 P@R-auc=0.687
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.642
model: Decision Tree with (2, 2)-Grams-CountVectorizer-vectorizer


	 Model: precision=0.598 recall=0.743 ROC-auc=0.571 accuracy=0.622 P@R-f1=0.663 P@R-auc=0.611
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.571
model: 10 Random Forest 5-max-depth with (2, 2)-Grams-CountVectorizer-vectorizer


	 Model: precision=0.660 recall=0.419 ROC-auc=0.696 accuracy=0.601 P@R-f1=0.512 P@R-auc=0.667
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.696
model: Neural Net with (2, 2)-Grams-CountVectorizer-vectorizer


	 Model: precision=0.622 recall=0.757 ROC-auc=0.750 accuracy=0.649 P@R-f1=0.683 P@R-auc=0.759
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.750
model: AdaBoost with (2, 2)-Grams-CountVectorizer-vectorizer


	 Model: precision=0.667 recall=0.703 ROC-auc=0.772 accuracy=0.676 P@R-f1=0.684 P@R-auc=0.802
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.772
model: LR with (2, 2)-Grams-TfidfVectorizer-vectorizer


	 Model: precision=0.738 recall=0.649 ROC-auc=0.745 accuracy=0.709 P@R-f1=0.691 P@R-auc=0.739
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.745
model: RF with (2, 2)-Grams-TfidfVectorizer-vectorizer


	 Model: precision=0.712 recall=0.703 ROC-auc=0.773 accuracy=0.709 P@R-f1=0.707 P@R-auc=0.798
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.773
model: 3 Nearest Neighbors with (2, 2)-Grams-TfidfVectorizer-vectorizer


	 Model: precision=0.600 recall=0.730 ROC-auc=0.675 accuracy=0.622 P@R-f1=0.659 P@R-auc=0.712
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.675
model: Decision Tree with (2, 2)-Grams-TfidfVectorizer-vectorizer


	 Model: precision=0.768 recall=0.716 ROC-auc=0.767 accuracy=0.750 P@R-f1=0.741 P@R-auc=0.763
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.767
model: 10 Random Forest 5-max-depth with (2, 2)-Grams-TfidfVectorizer-vectorizer


	 Model: precision=0.639 recall=0.932 ROC-auc=0.738 accuracy=0.703 P@R-f1=0.758 P@R-auc=0.709
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.738
model: Neural Net with (2, 2)-Grams-TfidfVectorizer-vectorizer


	 Model: precision=0.667 recall=0.676 ROC-auc=0.761 accuracy=0.669 P@R-f1=0.671 P@R-auc=0.743
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.761
model: AdaBoost with (2, 2)-Grams-TfidfVectorizer-vectorizer


	 Model: precision=0.667 recall=0.676 ROC-auc=0.761 accuracy=0.669 P@R-f1=0.671 P@R-auc=0.779
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.761
model: LR with (1, 3)-Grams-CountVectorizer-vectorizer


	 Model: precision=0.690 recall=0.784 ROC-auc=0.772 accuracy=0.716 P@R-f1=0.734 P@R-auc=0.773
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.772
model: RF with (1, 3)-Grams-CountVectorizer-vectorizer


	 Model: precision=0.742 recall=0.662 ROC-auc=0.813 accuracy=0.716 P@R-f1=0.700 P@R-auc=0.799
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.813
model: 3 Nearest Neighbors with (1, 3)-Grams-CountVectorizer-vectorizer


	 Model: precision=0.548 recall=0.689 ROC-auc=0.638 accuracy=0.561 P@R-f1=0.611 P@R-auc=0.699
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.638
model: Decision Tree with (1, 3)-Grams-CountVectorizer-vectorizer


	 Model: precision=0.724 recall=0.851 ROC-auc=0.765 accuracy=0.764 P@R-f1=0.783 P@R-auc=0.779
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.765
model: 10 Random Forest 5-max-depth with (1, 3)-Grams-CountVectorizer-vectorizer


	 Model: precision=0.760 recall=0.770 ROC-auc=0.789 accuracy=0.764 P@R-f1=0.765 P@R-auc=0.757
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.789
model: Neural Net with (1, 3)-Grams-CountVectorizer-vectorizer


	 Model: precision=0.639 recall=0.838 ROC-auc=0.742 accuracy=0.682 P@R-f1=0.725 P@R-auc=0.757
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.742
model: AdaBoost with (1, 3)-Grams-CountVectorizer-vectorizer


	 Model: precision=0.688 recall=0.716 ROC-auc=0.792 accuracy=0.696 P@R-f1=0.702 P@R-auc=0.814
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.792
model: LR with (1, 3)-Grams-TfidfVectorizer-vectorizer


	 Model: precision=0.742 recall=0.662 ROC-auc=0.742 accuracy=0.716 P@R-f1=0.700 P@R-auc=0.731
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.742
model: RF with (1, 3)-Grams-TfidfVectorizer-vectorizer


	 Model: precision=0.671 recall=0.716 ROC-auc=0.754 accuracy=0.682 P@R-f1=0.693 P@R-auc=0.773
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.754
model: 3 Nearest Neighbors with (1, 3)-Grams-TfidfVectorizer-vectorizer


	 Model: precision=0.583 recall=0.757 ROC-auc=0.640 accuracy=0.608 P@R-f1=0.659 P@R-auc=0.685
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.640
model: Decision Tree with (1, 3)-Grams-TfidfVectorizer-vectorizer


	 Model: precision=0.731 recall=0.662 ROC-auc=0.686 accuracy=0.709 P@R-f1=0.695 P@R-auc=0.722
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.686
model: 10 Random Forest 5-max-depth with (1, 3)-Grams-TfidfVectorizer-vectorizer


	 Model: precision=0.647 recall=0.595 ROC-auc=0.693 accuracy=0.635 P@R-f1=0.620 P@R-auc=0.653
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.693
model: Neural Net with (1, 3)-Grams-TfidfVectorizer-vectorizer


	 Model: precision=0.679 recall=0.716 ROC-auc=0.764 accuracy=0.689 P@R-f1=0.697 P@R-auc=0.724
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.764
model: AdaBoost with (1, 3)-Grams-TfidfVectorizer-vectorizer


	 Model: precision=0.723 recall=0.811 ROC-auc=0.812 accuracy=0.750 P@R-f1=0.764 P@R-auc=0.812
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.812
model: LR with (2, 3)-Grams-CountVectorizer-vectorizer


	 Model: precision=0.718 recall=0.757 ROC-auc=0.774 accuracy=0.730 P@R-f1=0.737 P@R-auc=0.775
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.774
model: RF with (2, 3)-Grams-CountVectorizer-vectorizer


	 Model: precision=0.735 recall=0.676 ROC-auc=0.771 accuracy=0.716 P@R-f1=0.704 P@R-auc=0.787
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.771
model: 3 Nearest Neighbors with (2, 3)-Grams-CountVectorizer-vectorizer


	 Model: precision=0.539 recall=0.649 ROC-auc=0.615 accuracy=0.547 P@R-f1=0.589 P@R-auc=0.664
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.615
model: Decision Tree with (2, 3)-Grams-CountVectorizer-vectorizer


	 Model: precision=0.661 recall=0.527 ROC-auc=0.625 accuracy=0.628 P@R-f1=0.586 P@R-auc=0.638
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.625
model: 10 Random Forest 5-max-depth with (2, 3)-Grams-CountVectorizer-vectorizer


	 Model: precision=0.633 recall=0.514 ROC-auc=0.648 accuracy=0.608 P@R-f1=0.567 P@R-auc=0.676
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.648
model: Neural Net with (2, 3)-Grams-CountVectorizer-vectorizer


	 Model: precision=0.635 recall=0.824 ROC-auc=0.748 accuracy=0.676 P@R-f1=0.718 P@R-auc=0.749
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.748
model: AdaBoost with (2, 3)-Grams-CountVectorizer-vectorizer


	 Model: precision=0.699 recall=0.689 ROC-auc=0.775 accuracy=0.696 P@R-f1=0.694 P@R-auc=0.763
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.775
model: LR with (2, 3)-Grams-TfidfVectorizer-vectorizer


	 Model: precision=0.758 recall=0.676 ROC-auc=0.755 accuracy=0.730 P@R-f1=0.714 P@R-auc=0.721
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.755
model: RF with (2, 3)-Grams-TfidfVectorizer-vectorizer


	 Model: precision=0.703 recall=0.703 ROC-auc=0.756 accuracy=0.703 P@R-f1=0.703 P@R-auc=0.788
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.756
model: 3 Nearest Neighbors with (2, 3)-Grams-TfidfVectorizer-vectorizer


	 Model: precision=0.556 recall=0.608 ROC-auc=0.628 accuracy=0.561 P@R-f1=0.581 P@R-auc=0.685
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.628
model: Decision Tree with (2, 3)-Grams-TfidfVectorizer-vectorizer


	 Model: precision=0.595 recall=0.595 ROC-auc=0.632 accuracy=0.595 P@R-f1=0.595 P@R-auc=0.689
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.632
model: 10 Random Forest 5-max-depth with (2, 3)-Grams-TfidfVectorizer-vectorizer


	 Model: precision=0.560 recall=0.189 ROC-auc=0.670 accuracy=0.520 P@R-f1=0.283 P@R-auc=0.630
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.670
model: Neural Net with (2, 3)-Grams-TfidfVectorizer-vectorizer


	 Model: precision=0.671 recall=0.743 ROC-auc=0.762 accuracy=0.689 P@R-f1=0.705 P@R-auc=0.719
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.762
model: AdaBoost with (2, 3)-Grams-TfidfVectorizer-vectorizer


	 Model: precision=0.693 recall=0.703 ROC-auc=0.790 accuracy=0.696 P@R-f1=0.698 P@R-auc=0.781
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.790
model: LR with (3, 3)-Grams-CountVectorizer-vectorizer


	 Model: precision=0.722 recall=0.770 ROC-auc=0.771 accuracy=0.736 P@R-f1=0.745 P@R-auc=0.761
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.771
model: RF with (3, 3)-Grams-CountVectorizer-vectorizer


	 Model: precision=0.750 recall=0.689 ROC-auc=0.794 accuracy=0.730 P@R-f1=0.718 P@R-auc=0.818
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.794
model: 3 Nearest Neighbors with (3, 3)-Grams-CountVectorizer-vectorizer


	 Model: precision=0.554 recall=0.689 ROC-auc=0.616 accuracy=0.568 P@R-f1=0.614 P@R-auc=0.663
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.616
model: Decision Tree with (3, 3)-Grams-CountVectorizer-vectorizer


	 Model: precision=0.689 recall=0.568 ROC-auc=0.682 accuracy=0.655 P@R-f1=0.622 P@R-auc=0.691
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.682
model: 10 Random Forest 5-max-depth with (3, 3)-Grams-CountVectorizer-vectorizer


	 Model: precision=0.551 recall=0.730 ROC-auc=0.621 accuracy=0.568 P@R-f1=0.628 P@R-auc=0.647
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.621
model: Neural Net with (3, 3)-Grams-CountVectorizer-vectorizer


	 Model: precision=0.678 recall=0.797 ROC-auc=0.750 accuracy=0.709 P@R-f1=0.733 P@R-auc=0.727
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.750
model: AdaBoost with (3, 3)-Grams-CountVectorizer-vectorizer


	 Model: precision=0.726 recall=0.716 ROC-auc=0.802 accuracy=0.723 P@R-f1=0.721 P@R-auc=0.830
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.802
model: LR with (3, 3)-Grams-TfidfVectorizer-vectorizer


	 Model: precision=0.754 recall=0.662 ROC-auc=0.766 accuracy=0.723 P@R-f1=0.705 P@R-auc=0.720
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.766
model: RF with (3, 3)-Grams-TfidfVectorizer-vectorizer


	 Model: precision=0.688 recall=0.595 ROC-auc=0.781 accuracy=0.662 P@R-f1=0.638 P@R-auc=0.775
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.781
model: 3 Nearest Neighbors with (3, 3)-Grams-TfidfVectorizer-vectorizer


	 Model: precision=0.570 recall=0.608 ROC-auc=0.641 accuracy=0.574 P@R-f1=0.588 P@R-auc=0.702
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.641
model: Decision Tree with (3, 3)-Grams-TfidfVectorizer-vectorizer


	 Model: precision=0.706 recall=0.649 ROC-auc=0.716 accuracy=0.689 P@R-f1=0.676 P@R-auc=0.709
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.716
model: 10 Random Forest 5-max-depth with (3, 3)-Grams-TfidfVectorizer-vectorizer


	 Model: precision=0.514 recall=0.243 ROC-auc=0.611 accuracy=0.507 P@R-f1=0.330 P@R-auc=0.562
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.611
model: Neural Net with (3, 3)-Grams-TfidfVectorizer-vectorizer


	 Model: precision=0.684 recall=0.730 ROC-auc=0.785 accuracy=0.696 P@R-f1=0.706 P@R-auc=0.729
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.785
model: AdaBoost with (3, 3)-Grams-TfidfVectorizer-vectorizer


	 Model: precision=0.726 recall=0.716 ROC-auc=0.775 accuracy=0.723 P@R-f1=0.721 P@R-auc=0.776
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.775
model: LR with (1, 4)-Grams-CountVectorizer-vectorizer


	 Model: precision=0.691 recall=0.757 ROC-auc=0.767 accuracy=0.709 P@R-f1=0.723 P@R-auc=0.775
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.767
model: RF with (1, 4)-Grams-CountVectorizer-vectorizer


	 Model: precision=0.729 recall=0.689 ROC-auc=0.780 accuracy=0.716 P@R-f1=0.708 P@R-auc=0.808
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.780
model: 3 Nearest Neighbors with (1, 4)-Grams-CountVectorizer-vectorizer


	 Model: precision=0.549 recall=0.676 ROC-auc=0.620 accuracy=0.561 P@R-f1=0.606 P@R-auc=0.668
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.620
model: Decision Tree with (1, 4)-Grams-CountVectorizer-vectorizer


	 Model: precision=0.612 recall=0.811 ROC-auc=0.700 accuracy=0.649 P@R-f1=0.698 P@R-auc=0.749
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.700
model: 10 Random Forest 5-max-depth with (1, 4)-Grams-CountVectorizer-vectorizer


	 Model: precision=0.507 recall=0.919 ROC-auc=0.637 accuracy=0.514 P@R-f1=0.654 P@R-auc=0.724
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.637
model: Neural Net with (1, 4)-Grams-CountVectorizer-vectorizer


	 Model: precision=0.632 recall=0.743 ROC-auc=0.742 accuracy=0.655 P@R-f1=0.683 P@R-auc=0.763
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.742
model: AdaBoost with (1, 4)-Grams-CountVectorizer-vectorizer


	 Model: precision=0.709 recall=0.757 ROC-auc=0.798 accuracy=0.723 P@R-f1=0.732 P@R-auc=0.814
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.798
model: LR with (1, 4)-Grams-TfidfVectorizer-vectorizer


	 Model: precision=0.729 recall=0.689 ROC-auc=0.749 accuracy=0.716 P@R-f1=0.708 P@R-auc=0.719
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.749
model: RF with (1, 4)-Grams-TfidfVectorizer-vectorizer


	 Model: precision=0.732 recall=0.703 ROC-auc=0.779 accuracy=0.723 P@R-f1=0.717 P@R-auc=0.823
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.779
model: 3 Nearest Neighbors with (1, 4)-Grams-TfidfVectorizer-vectorizer


	 Model: precision=0.576 recall=0.716 ROC-auc=0.641 accuracy=0.595 P@R-f1=0.639 P@R-auc=0.692
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.641
model: Decision Tree with (1, 4)-Grams-TfidfVectorizer-vectorizer


	 Model: precision=0.600 recall=0.730 ROC-auc=0.583 accuracy=0.622 P@R-f1=0.659 P@R-auc=0.569
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.583
model: 10 Random Forest 5-max-depth with (1, 4)-Grams-TfidfVectorizer-vectorizer


	 Model: precision=0.833 recall=0.270 ROC-auc=0.690 accuracy=0.608 P@R-f1=0.408 P@R-auc=0.745
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.690
model: Neural Net with (1, 4)-Grams-TfidfVectorizer-vectorizer


	 Model: precision=0.690 recall=0.784 ROC-auc=0.785 accuracy=0.716 P@R-f1=0.734 P@R-auc=0.726
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.785
model: AdaBoost with (1, 4)-Grams-TfidfVectorizer-vectorizer


	 Model: precision=0.739 recall=0.689 ROC-auc=0.803 accuracy=0.723 P@R-f1=0.713 P@R-auc=0.787
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.803
model: LR with (2, 4)-Grams-CountVectorizer-vectorizer


	 Model: precision=0.705 recall=0.743 ROC-auc=0.755 accuracy=0.716 P@R-f1=0.724 P@R-auc=0.768
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.755
model: RF with (2, 4)-Grams-CountVectorizer-vectorizer


	 Model: precision=0.742 recall=0.662 ROC-auc=0.791 accuracy=0.716 P@R-f1=0.700 P@R-auc=0.807
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.791
model: 3 Nearest Neighbors with (2, 4)-Grams-CountVectorizer-vectorizer


	 Model: precision=0.551 recall=0.662 ROC-auc=0.621 accuracy=0.561 P@R-f1=0.601 P@R-auc=0.671
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.621
model: Decision Tree with (2, 4)-Grams-CountVectorizer-vectorizer


	 Model: precision=0.677 recall=0.595 ROC-auc=0.698 accuracy=0.655 P@R-f1=0.633 P@R-auc=0.724
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.698
model: 10 Random Forest 5-max-depth with (2, 4)-Grams-CountVectorizer-vectorizer


	 Model: precision=0.571 recall=0.432 ROC-auc=0.601 accuracy=0.554 P@R-f1=0.492 P@R-auc=0.596
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.601
model: Neural Net with (2, 4)-Grams-CountVectorizer-vectorizer


	 Model: precision=0.659 recall=0.811 ROC-auc=0.770 accuracy=0.696 P@R-f1=0.727 P@R-auc=0.744
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.770
model: AdaBoost with (2, 4)-Grams-CountVectorizer-vectorizer


	 Model: precision=0.740 recall=0.730 ROC-auc=0.827 accuracy=0.736 P@R-f1=0.735 P@R-auc=0.810
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.827
model: LR with (2, 4)-Grams-TfidfVectorizer-vectorizer


	 Model: precision=0.758 recall=0.676 ROC-auc=0.762 accuracy=0.730 P@R-f1=0.714 P@R-auc=0.716
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.762
model: RF with (2, 4)-Grams-TfidfVectorizer-vectorizer


	 Model: precision=0.706 recall=0.649 ROC-auc=0.767 accuracy=0.689 P@R-f1=0.676 P@R-auc=0.764
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.767
model: 3 Nearest Neighbors with (2, 4)-Grams-TfidfVectorizer-vectorizer


	 Model: precision=0.565 recall=0.649 ROC-auc=0.641 accuracy=0.574 P@R-f1=0.604 P@R-auc=0.695
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.641
model: Decision Tree with (2, 4)-Grams-TfidfVectorizer-vectorizer


	 Model: precision=0.621 recall=0.554 ROC-auc=0.668 accuracy=0.608 P@R-f1=0.586 P@R-auc=0.684
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.668
model: 10 Random Forest 5-max-depth with (2, 4)-Grams-TfidfVectorizer-vectorizer


	 Model: precision=0.419 recall=0.243 ROC-auc=0.555 accuracy=0.453 P@R-f1=0.308 P@R-auc=0.512
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.555
model: Neural Net with (2, 4)-Grams-TfidfVectorizer-vectorizer


	 Model: precision=0.691 recall=0.757 ROC-auc=0.792 accuracy=0.709 P@R-f1=0.723 P@R-auc=0.729
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.792
model: AdaBoost with (2, 4)-Grams-TfidfVectorizer-vectorizer


	 Model: precision=0.714 recall=0.676 ROC-auc=0.810 accuracy=0.703 P@R-f1=0.694 P@R-auc=0.792
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.810
model: LR with (3, 4)-Grams-CountVectorizer-vectorizer


	 Model: precision=0.724 recall=0.743 ROC-auc=0.756 accuracy=0.730 P@R-f1=0.733 P@R-auc=0.755
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.756
model: RF with (3, 4)-Grams-CountVectorizer-vectorizer


	 Model: precision=0.662 recall=0.581 ROC-auc=0.735 accuracy=0.642 P@R-f1=0.619 P@R-auc=0.752
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.735
model: 3 Nearest Neighbors with (3, 4)-Grams-CountVectorizer-vectorizer


	 Model: precision=0.548 recall=0.689 ROC-auc=0.611 accuracy=0.561 P@R-f1=0.611 P@R-auc=0.666
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.611
model: Decision Tree with (3, 4)-Grams-CountVectorizer-vectorizer


	 Model: precision=0.556 recall=0.608 ROC-auc=0.600 accuracy=0.561 P@R-f1=0.581 P@R-auc=0.664
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.600
model: 10 Random Forest 5-max-depth with (3, 4)-Grams-CountVectorizer-vectorizer


	 Model: precision=0.569 recall=0.392 ROC-auc=0.593 accuracy=0.547 P@R-f1=0.464 P@R-auc=0.632
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.593
model: Neural Net with (3, 4)-Grams-CountVectorizer-vectorizer


	 Model: precision=0.655 recall=0.770 ROC-auc=0.751 accuracy=0.682 P@R-f1=0.708 P@R-auc=0.745
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.751
model: AdaBoost with (3, 4)-Grams-CountVectorizer-vectorizer


	 Model: precision=0.709 recall=0.757 ROC-auc=0.811 accuracy=0.723 P@R-f1=0.732 P@R-auc=0.809
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.811
model: LR with (3, 4)-Grams-TfidfVectorizer-vectorizer


	 Model: precision=0.746 recall=0.635 ROC-auc=0.772 accuracy=0.709 P@R-f1=0.686 P@R-auc=0.723
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.772
model: RF with (3, 4)-Grams-TfidfVectorizer-vectorizer


	 Model: precision=0.761 recall=0.689 ROC-auc=0.816 accuracy=0.736 P@R-f1=0.723 P@R-auc=0.833
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.816
model: 3 Nearest Neighbors with (3, 4)-Grams-TfidfVectorizer-vectorizer


	 Model: precision=0.588 recall=0.635 ROC-auc=0.656 accuracy=0.595 P@R-f1=0.610 P@R-auc=0.705
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.656
model: Decision Tree with (3, 4)-Grams-TfidfVectorizer-vectorizer


	 Model: precision=0.707 recall=0.554 ROC-auc=0.719 accuracy=0.662 P@R-f1=0.621 P@R-auc=0.734
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.719
model: 10 Random Forest 5-max-depth with (3, 4)-Grams-TfidfVectorizer-vectorizer


	 Model: precision=0.808 recall=0.284 ROC-auc=0.654 accuracy=0.608 P@R-f1=0.420 P@R-auc=0.698
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.654
model: Neural Net with (3, 4)-Grams-TfidfVectorizer-vectorizer


	 Model: precision=0.688 recall=0.716 ROC-auc=0.770 accuracy=0.696 P@R-f1=0.702 P@R-auc=0.707
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.770
model: AdaBoost with (3, 4)-Grams-TfidfVectorizer-vectorizer


	 Model: precision=0.684 recall=0.730 ROC-auc=0.747 accuracy=0.696 P@R-f1=0.706 P@R-auc=0.747
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.747
model: LR with (4, 4)-Grams-CountVectorizer-vectorizer


	 Model: precision=0.720 recall=0.730 ROC-auc=0.753 accuracy=0.723 P@R-f1=0.725 P@R-auc=0.728
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.753
model: RF with (4, 4)-Grams-CountVectorizer-vectorizer


	 Model: precision=0.708 recall=0.622 ROC-auc=0.767 accuracy=0.682 P@R-f1=0.662 P@R-auc=0.808
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.767
model: 3 Nearest Neighbors with (4, 4)-Grams-CountVectorizer-vectorizer


	 Model: precision=0.545 recall=0.649 ROC-auc=0.611 accuracy=0.554 P@R-f1=0.593 P@R-auc=0.670
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.611
model: Decision Tree with (4, 4)-Grams-CountVectorizer-vectorizer


	 Model: precision=0.727 recall=0.541 ROC-auc=0.714 accuracy=0.669 P@R-f1=0.620 P@R-auc=0.741
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.714
model: 10 Random Forest 5-max-depth with (4, 4)-Grams-CountVectorizer-vectorizer


	 Model: precision=0.500 recall=0.419 ROC-auc=0.522 accuracy=0.500 P@R-f1=0.456 P@R-auc=0.544
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.522
model: Neural Net with (4, 4)-Grams-CountVectorizer-vectorizer


	 Model: precision=0.646 recall=0.689 ROC-auc=0.720 accuracy=0.655 P@R-f1=0.667 P@R-auc=0.702
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.720
model: AdaBoost with (4, 4)-Grams-CountVectorizer-vectorizer


	 Model: precision=0.803 recall=0.716 ROC-auc=0.821 accuracy=0.770 P@R-f1=0.757 P@R-auc=0.842
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.821
model: LR with (4, 4)-Grams-TfidfVectorizer-vectorizer


	 Model: precision=0.710 recall=0.662 ROC-auc=0.760 accuracy=0.696 P@R-f1=0.685 P@R-auc=0.707
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.760
model: RF with (4, 4)-Grams-TfidfVectorizer-vectorizer


	 Model: precision=0.761 recall=0.689 ROC-auc=0.780 accuracy=0.736 P@R-f1=0.723 P@R-auc=0.802
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.780
model: 3 Nearest Neighbors with (4, 4)-Grams-TfidfVectorizer-vectorizer


	 Model: precision=0.608 recall=0.649 ROC-auc=0.661 accuracy=0.615 P@R-f1=0.627 P@R-auc=0.698
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.661
model: Decision Tree with (4, 4)-Grams-TfidfVectorizer-vectorizer


	 Model: precision=0.741 recall=0.541 ROC-auc=0.708 accuracy=0.676 P@R-f1=0.625 P@R-auc=0.736
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.708
model: 10 Random Forest 5-max-depth with (4, 4)-Grams-TfidfVectorizer-vectorizer


	 Model: precision=0.909 recall=0.135 ROC-auc=0.690 accuracy=0.561 P@R-f1=0.235 P@R-auc=0.748
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.690
model: Neural Net with (4, 4)-Grams-TfidfVectorizer-vectorizer


	 Model: precision=0.629 recall=0.757 ROC-auc=0.740 accuracy=0.655 P@R-f1=0.687 P@R-auc=0.684
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.740
model: AdaBoost with (4, 4)-Grams-TfidfVectorizer-vectorizer


	 Model: precision=0.603 recall=0.635 ROC-auc=0.721 accuracy=0.608 P@R-f1=0.618 P@R-auc=0.741
	 No Skill: ROC AUC=0.500
	 Model: ROC AUC=0.721
############################
and the winner is........
10 Random Forest 5-max-depth with (4, 4)-Grams-TfidfVectorizer-vectorizer
!!!!!!!!!!!!
############################
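The winning row's logged metrics (precision=0.909, recall=0.135, accuracy=0.561, P@R-f1=0.235) are all reproduced by a confusion matrix of 10 true positives, 1 false positive, 64 false negatives and 73 true negatives on a 148-sample test set. These counts are not printed anywhere in the notebook; they are inferred from the rounded metrics and should be read as an assumption. Under that assumption the model flags only 11 samples as malicious. A minimal sketch of the arithmetic:

```python
# Hypothetical confusion-matrix counts, inferred from the logged metrics
# of the winning model; the notebook itself does not print them.
tp, fp, fn, tn = 10, 1, 64, 73

precision = tp / (tp + fp)                      # share of flagged samples that are malicious
recall = tp / (tp + fn)                         # share of malicious samples that get flagged
accuracy = (tp + tn) / (tp + fp + fn + tn)      # share of all samples classified correctly
f1 = 2 * precision * recall / (precision + recall)

print(f"precision={precision:.3f} recall={recall:.3f} "
      f"accuracy={accuracy:.3f} f1={f1:.3f}")
# prints: precision=0.909 recall=0.135 accuracy=0.561 f1=0.235
```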
In [16]:
scoreboard_dict_list
Out[16]:
[{'model': 'LR with (1, 1)-Grams-CountVectorizer-vectorizer',
  'score': 0.6483516483516484},
 {'model': 'RF with (1, 1)-Grams-CountVectorizer-vectorizer',
  'score': 0.6883116883116883},
 {'model': '3 Nearest Neighbors with (1, 1)-Grams-CountVectorizer-vectorizer',
  'score': 0.5555555555555556},
 {'model': 'Decision Tree with (1, 1)-Grams-CountVectorizer-vectorizer',
  'score': 0.6883116883116883},
 {'model': '10 Random Forest 5-max-depth with (1, 1)-Grams-CountVectorizer-vectorizer',
  'score': 0.6617647058823529},
 {'model': 'Neural Net with (1, 1)-Grams-CountVectorizer-vectorizer',
  'score': 0.6233766233766234},
 {'model': 'AdaBoost with (1, 1)-Grams-CountVectorizer-vectorizer',
  'score': 0.7733333333333333},
 {'model': 'LR with (1, 1)-Grams-TfidfVectorizer-vectorizer',
  'score': 0.6486486486486487},
 {'model': 'RF with (1, 1)-Grams-TfidfVectorizer-vectorizer',
  'score': 0.7162162162162162},
 {'model': '3 Nearest Neighbors with (1, 1)-Grams-TfidfVectorizer-vectorizer',
  'score': 0.5742574257425742},
 {'model': 'Decision Tree with (1, 1)-Grams-TfidfVectorizer-vectorizer',
  'score': 0.6896551724137931},
 {'model': '10 Random Forest 5-max-depth with (1, 1)-Grams-TfidfVectorizer-vectorizer',
  'score': 0.8194444444444444},
 {'model': 'Neural Net with (1, 1)-Grams-TfidfVectorizer-vectorizer',
  'score': 0.5795454545454546},
 {'model': 'AdaBoost with (1, 1)-Grams-TfidfVectorizer-vectorizer',
  'score': 0.6451612903225806},
 {'model': 'LR with (1, 2)-Grams-CountVectorizer-vectorizer',
  'score': 0.6739130434782609},
 {'model': 'RF with (1, 2)-Grams-CountVectorizer-vectorizer',
  'score': 0.859375},
 {'model': '3 Nearest Neighbors with (1, 2)-Grams-CountVectorizer-vectorizer',
  'score': 0.5591397849462365},
 {'model': 'Decision Tree with (1, 2)-Grams-CountVectorizer-vectorizer',
  'score': 0.7},
 {'model': '10 Random Forest 5-max-depth with (1, 2)-Grams-CountVectorizer-vectorizer',
  'score': 0.7611940298507462},
 {'model': 'Neural Net with (1, 2)-Grams-CountVectorizer-vectorizer',
  'score': 0.6559139784946236},
 {'model': 'AdaBoost with (1, 2)-Grams-CountVectorizer-vectorizer',
  'score': 0.7105263157894737},
 {'model': 'LR with (1, 2)-Grams-TfidfVectorizer-vectorizer',
  'score': 0.6805555555555556},
 {'model': 'RF with (1, 2)-Grams-TfidfVectorizer-vectorizer',
  'score': 0.6956521739130435},
 {'model': '3 Nearest Neighbors with (1, 2)-Grams-TfidfVectorizer-vectorizer',
  'score': 0.5729166666666666},
 {'model': 'Decision Tree with (1, 2)-Grams-TfidfVectorizer-vectorizer',
  'score': 0.6823529411764706},
 {'model': '10 Random Forest 5-max-depth with (1, 2)-Grams-TfidfVectorizer-vectorizer',
  'score': 0.5581395348837209},
 {'model': 'Neural Net with (1, 2)-Grams-TfidfVectorizer-vectorizer',
  'score': 0.65},
 {'model': 'AdaBoost with (1, 2)-Grams-TfidfVectorizer-vectorizer',
  'score': 0.6404494382022472},
 {'model': 'LR with (2, 2)-Grams-CountVectorizer-vectorizer',
  'score': 0.6739130434782609},
 {'model': 'RF with (2, 2)-Grams-CountVectorizer-vectorizer',
  'score': 0.819672131147541},
 {'model': '3 Nearest Neighbors with (2, 2)-Grams-CountVectorizer-vectorizer',
  'score': 0.5684210526315789},
 {'model': 'Decision Tree with (2, 2)-Grams-CountVectorizer-vectorizer',
  'score': 0.5978260869565217},
 {'model': '10 Random Forest 5-max-depth with (2, 2)-Grams-CountVectorizer-vectorizer',
  'score': 0.6595744680851063},
 {'model': 'Neural Net with (2, 2)-Grams-CountVectorizer-vectorizer',
  'score': 0.6222222222222222},
 {'model': 'AdaBoost with (2, 2)-Grams-CountVectorizer-vectorizer',
  'score': 0.6666666666666666},
 {'model': 'LR with (2, 2)-Grams-TfidfVectorizer-vectorizer',
  'score': 0.7384615384615385},
 {'model': 'RF with (2, 2)-Grams-TfidfVectorizer-vectorizer',
  'score': 0.7123287671232876},
 {'model': '3 Nearest Neighbors with (2, 2)-Grams-TfidfVectorizer-vectorizer',
  'score': 0.6},
 {'model': 'Decision Tree with (2, 2)-Grams-TfidfVectorizer-vectorizer',
  'score': 0.7681159420289855},
 {'model': '10 Random Forest 5-max-depth with (2, 2)-Grams-TfidfVectorizer-vectorizer',
  'score': 0.6388888888888888},
 {'model': 'Neural Net with (2, 2)-Grams-TfidfVectorizer-vectorizer',
  'score': 0.6666666666666666},
 {'model': 'AdaBoost with (2, 2)-Grams-TfidfVectorizer-vectorizer',
  'score': 0.6666666666666666},
 {'model': 'LR with (1, 3)-Grams-CountVectorizer-vectorizer',
  'score': 0.6904761904761905},
 {'model': 'RF with (1, 3)-Grams-CountVectorizer-vectorizer',
  'score': 0.7424242424242424},
 {'model': '3 Nearest Neighbors with (1, 3)-Grams-CountVectorizer-vectorizer',
  'score': 0.5483870967741935},
 {'model': 'Decision Tree with (1, 3)-Grams-CountVectorizer-vectorizer',
  'score': 0.7241379310344828},
 {'model': '10 Random Forest 5-max-depth with (1, 3)-Grams-CountVectorizer-vectorizer',
  'score': 0.76},
 {'model': 'Neural Net with (1, 3)-Grams-CountVectorizer-vectorizer',
  'score': 0.6391752577319587},
 {'model': 'AdaBoost with (1, 3)-Grams-CountVectorizer-vectorizer',
  'score': 0.6883116883116883},
 {'model': 'LR with (1, 3)-Grams-TfidfVectorizer-vectorizer',
  'score': 0.7424242424242424},
 {'model': 'RF with (1, 3)-Grams-TfidfVectorizer-vectorizer',
  'score': 0.6708860759493671},
 {'model': '3 Nearest Neighbors with (1, 3)-Grams-TfidfVectorizer-vectorizer',
  'score': 0.5833333333333334},
 {'model': 'Decision Tree with (1, 3)-Grams-TfidfVectorizer-vectorizer',
  'score': 0.7313432835820896},
 {'model': '10 Random Forest 5-max-depth with (1, 3)-Grams-TfidfVectorizer-vectorizer',
  'score': 0.6470588235294118},
 {'model': 'Neural Net with (1, 3)-Grams-TfidfVectorizer-vectorizer',
  'score': 0.6794871794871795},
 {'model': 'AdaBoost with (1, 3)-Grams-TfidfVectorizer-vectorizer',
  'score': 0.7228915662650602},
 {'model': 'LR with (2, 3)-Grams-CountVectorizer-vectorizer',
  'score': 0.717948717948718},
 {'model': 'RF with (2, 3)-Grams-CountVectorizer-vectorizer',
  'score': 0.7352941176470589},
 {'model': '3 Nearest Neighbors with (2, 3)-Grams-CountVectorizer-vectorizer',
  'score': 0.5393258426966292},
 {'model': 'Decision Tree with (2, 3)-Grams-CountVectorizer-vectorizer',
  'score': 0.6610169491525424},
 {'model': '10 Random Forest 5-max-depth with (2, 3)-Grams-CountVectorizer-vectorizer',
  'score': 0.6333333333333333},
 {'model': 'Neural Net with (2, 3)-Grams-CountVectorizer-vectorizer',
  'score': 0.6354166666666666},
 {'model': 'AdaBoost with (2, 3)-Grams-CountVectorizer-vectorizer',
  'score': 0.6986301369863014},
 {'model': 'LR with (2, 3)-Grams-TfidfVectorizer-vectorizer',
  'score': 0.7575757575757576},
 {'model': 'RF with (2, 3)-Grams-TfidfVectorizer-vectorizer',
  'score': 0.7027027027027027},
 {'model': '3 Nearest Neighbors with (2, 3)-Grams-TfidfVectorizer-vectorizer',
  'score': 0.5555555555555556},
 {'model': 'Decision Tree with (2, 3)-Grams-TfidfVectorizer-vectorizer',
  'score': 0.5945945945945946},
 {'model': '10 Random Forest 5-max-depth with (2, 3)-Grams-TfidfVectorizer-vectorizer',
  'score': 0.56},
 {'model': 'Neural Net with (2, 3)-Grams-TfidfVectorizer-vectorizer',
  'score': 0.6707317073170732},
 {'model': 'AdaBoost with (2, 3)-Grams-TfidfVectorizer-vectorizer',
  'score': 0.6933333333333334},
 {'model': 'LR with (3, 3)-Grams-CountVectorizer-vectorizer',
  'score': 0.7215189873417721},
 {'model': 'RF with (3, 3)-Grams-CountVectorizer-vectorizer', 'score': 0.75},
 {'model': '3 Nearest Neighbors with (3, 3)-Grams-CountVectorizer-vectorizer',
  'score': 0.5543478260869565},
 {'model': 'Decision Tree with (3, 3)-Grams-CountVectorizer-vectorizer',
  'score': 0.6885245901639344},
 {'model': '10 Random Forest 5-max-depth with (3, 3)-Grams-CountVectorizer-vectorizer',
  'score': 0.5510204081632653},
 {'model': 'Neural Net with (3, 3)-Grams-CountVectorizer-vectorizer',
  'score': 0.6781609195402298},
 {'model': 'AdaBoost with (3, 3)-Grams-CountVectorizer-vectorizer',
  'score': 0.726027397260274},
 {'model': 'LR with (3, 3)-Grams-TfidfVectorizer-vectorizer',
  'score': 0.7538461538461538},
 {'model': 'RF with (3, 3)-Grams-TfidfVectorizer-vectorizer', 'score': 0.6875},
 {'model': '3 Nearest Neighbors with (3, 3)-Grams-TfidfVectorizer-vectorizer',
  'score': 0.569620253164557},
 {'model': 'Decision Tree with (3, 3)-Grams-TfidfVectorizer-vectorizer',
  'score': 0.7058823529411765},
 {'model': '10 Random Forest 5-max-depth with (3, 3)-Grams-TfidfVectorizer-vectorizer',
  'score': 0.5142857142857142},
 {'model': 'Neural Net with (3, 3)-Grams-TfidfVectorizer-vectorizer',
  'score': 0.6835443037974683},
 {'model': 'AdaBoost with (3, 3)-Grams-TfidfVectorizer-vectorizer',
  'score': 0.726027397260274},
 {'model': 'LR with (1, 4)-Grams-CountVectorizer-vectorizer',
  'score': 0.691358024691358},
 {'model': 'RF with (1, 4)-Grams-CountVectorizer-vectorizer',
  'score': 0.7285714285714285},
 {'model': '3 Nearest Neighbors with (1, 4)-Grams-CountVectorizer-vectorizer',
  'score': 0.5494505494505495},
 {'model': 'Decision Tree with (1, 4)-Grams-CountVectorizer-vectorizer',
  'score': 0.6122448979591837},
 {'model': '10 Random Forest 5-max-depth with (1, 4)-Grams-CountVectorizer-vectorizer',
  'score': 0.5074626865671642},
 {'model': 'Neural Net with (1, 4)-Grams-CountVectorizer-vectorizer',
  'score': 0.632183908045977},
 {'model': 'AdaBoost with (1, 4)-Grams-CountVectorizer-vectorizer',
  'score': 0.7088607594936709},
 {'model': 'LR with (1, 4)-Grams-TfidfVectorizer-vectorizer',
  'score': 0.7285714285714285},
 {'model': 'RF with (1, 4)-Grams-TfidfVectorizer-vectorizer',
  'score': 0.7323943661971831},
 {'model': '3 Nearest Neighbors with (1, 4)-Grams-TfidfVectorizer-vectorizer',
  'score': 0.5760869565217391},
 {'model': 'Decision Tree with (1, 4)-Grams-TfidfVectorizer-vectorizer',
  'score': 0.6},
 {'model': '10 Random Forest 5-max-depth with (1, 4)-Grams-TfidfVectorizer-vectorizer',
  'score': 0.8333333333333334},
 {'model': 'Neural Net with (1, 4)-Grams-TfidfVectorizer-vectorizer',
  'score': 0.6904761904761905},
 {'model': 'AdaBoost with (1, 4)-Grams-TfidfVectorizer-vectorizer',
  'score': 0.7391304347826086},
 {'model': 'LR with (2, 4)-Grams-CountVectorizer-vectorizer',
  'score': 0.7051282051282052},
 {'model': 'RF with (2, 4)-Grams-CountVectorizer-vectorizer',
  'score': 0.7424242424242424},
 {'model': '3 Nearest Neighbors with (2, 4)-Grams-CountVectorizer-vectorizer',
  'score': 0.550561797752809},
 {'model': 'Decision Tree with (2, 4)-Grams-CountVectorizer-vectorizer',
  'score': 0.676923076923077},
 {'model': '10 Random Forest 5-max-depth with (2, 4)-Grams-CountVectorizer-vectorizer',
  'score': 0.5714285714285714},
 {'model': 'Neural Net with (2, 4)-Grams-CountVectorizer-vectorizer',
  'score': 0.6593406593406593},
 {'model': 'AdaBoost with (2, 4)-Grams-CountVectorizer-vectorizer',
  'score': 0.7397260273972602},
 {'model': 'LR with (2, 4)-Grams-TfidfVectorizer-vectorizer',
  'score': 0.7575757575757576},
 {'model': 'RF with (2, 4)-Grams-TfidfVectorizer-vectorizer',
  'score': 0.7058823529411765},
 {'model': '3 Nearest Neighbors with (2, 4)-Grams-TfidfVectorizer-vectorizer',
  'score': 0.5647058823529412},
 {'model': 'Decision Tree with (2, 4)-Grams-TfidfVectorizer-vectorizer',
  'score': 0.6212121212121212},
 {'model': '10 Random Forest 5-max-depth with (2, 4)-Grams-TfidfVectorizer-vectorizer',
  'score': 0.4186046511627907},
 {'model': 'Neural Net with (2, 4)-Grams-TfidfVectorizer-vectorizer',
  'score': 0.691358024691358},
 {'model': 'AdaBoost with (2, 4)-Grams-TfidfVectorizer-vectorizer',
  'score': 0.7142857142857143},
 {'model': 'LR with (3, 4)-Grams-CountVectorizer-vectorizer',
  'score': 0.7236842105263158},
 {'model': 'RF with (3, 4)-Grams-CountVectorizer-vectorizer',
  'score': 0.6615384615384615},
 {'model': '3 Nearest Neighbors with (3, 4)-Grams-CountVectorizer-vectorizer',
  'score': 0.5483870967741935},
 {'model': 'Decision Tree with (3, 4)-Grams-CountVectorizer-vectorizer',
  'score': 0.5555555555555556},
 {'model': '10 Random Forest 5-max-depth with (3, 4)-Grams-CountVectorizer-vectorizer',
  'score': 0.5686274509803921},
 {'model': 'Neural Net with (3, 4)-Grams-CountVectorizer-vectorizer',
  'score': 0.6551724137931034},
 {'model': 'AdaBoost with (3, 4)-Grams-CountVectorizer-vectorizer',
  'score': 0.7088607594936709},
 {'model': 'LR with (3, 4)-Grams-TfidfVectorizer-vectorizer',
  'score': 0.746031746031746},
 {'model': 'RF with (3, 4)-Grams-TfidfVectorizer-vectorizer',
  'score': 0.7611940298507462},
 {'model': '3 Nearest Neighbors with (3, 4)-Grams-TfidfVectorizer-vectorizer',
  'score': 0.5875},
 {'model': 'Decision Tree with (3, 4)-Grams-TfidfVectorizer-vectorizer',
  'score': 0.7068965517241379},
 {'model': '10 Random Forest 5-max-depth with (3, 4)-Grams-TfidfVectorizer-vectorizer',
  'score': 0.8076923076923077},
 {'model': 'Neural Net with (3, 4)-Grams-TfidfVectorizer-vectorizer',
  'score': 0.6883116883116883},
 {'model': 'AdaBoost with (3, 4)-Grams-TfidfVectorizer-vectorizer',
  'score': 0.6835443037974683},
 {'model': 'LR with (4, 4)-Grams-CountVectorizer-vectorizer', 'score': 0.72},
 {'model': 'RF with (4, 4)-Grams-CountVectorizer-vectorizer',
  'score': 0.7076923076923077},
 {'model': '3 Nearest Neighbors with (4, 4)-Grams-CountVectorizer-vectorizer',
  'score': 0.5454545454545454},
 {'model': 'Decision Tree with (4, 4)-Grams-CountVectorizer-vectorizer',
  'score': 0.7272727272727273},
 {'model': '10 Random Forest 5-max-depth with (4, 4)-Grams-CountVectorizer-vectorizer',
  'score': 0.5},
 {'model': 'Neural Net with (4, 4)-Grams-CountVectorizer-vectorizer',
  'score': 0.6455696202531646},
 {'model': 'AdaBoost with (4, 4)-Grams-CountVectorizer-vectorizer',
  'score': 0.803030303030303},
 {'model': 'LR with (4, 4)-Grams-TfidfVectorizer-vectorizer',
  'score': 0.7101449275362319},
 {'model': 'RF with (4, 4)-Grams-TfidfVectorizer-vectorizer',
  'score': 0.7611940298507462},
 {'model': '3 Nearest Neighbors with (4, 4)-Grams-TfidfVectorizer-vectorizer',
  'score': 0.6075949367088608},
 {'model': 'Decision Tree with (4, 4)-Grams-TfidfVectorizer-vectorizer',
  'score': 0.7407407407407407},
 {'model': '10 Random Forest 5-max-depth with (4, 4)-Grams-TfidfVectorizer-vectorizer',
  'score': 0.9090909090909091},
 {'model': 'Neural Net with (4, 4)-Grams-TfidfVectorizer-vectorizer',
  'score': 0.6292134831460674},
 {'model': 'AdaBoost with (4, 4)-Grams-TfidfVectorizer-vectorizer',
  'score': 0.6025641025641025}]
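The 'score' values in this list appear to be the unrounded precision of each model: the entries that can be compared with the log above agree with the printed precision to three decimals (for example 0.5543478... against precision=0.554). A minimal sketch of that cross-check, assuming the log text is available as a string; the names `LOG_SAMPLE`, `METRIC_RE` and `parse_log` are hypothetical and not part of the notebook:

```python
import re

# Two blocks copied from the log output above (a model name, then its metrics).
LOG_SAMPLE = """\
model: 3 Nearest Neighbors with (3, 3)-Grams-CountVectorizer-vectorizer
\t Model: precision=0.554 recall=0.689 ROC-auc=0.616 accuracy=0.568 P@R-f1=0.614 P@R-auc=0.663
model: AdaBoost with (4, 4)-Grams-CountVectorizer-vectorizer
\t Model: precision=0.803 recall=0.716 ROC-auc=0.821 accuracy=0.770 P@R-f1=0.757 P@R-auc=0.842
"""

# Matches "name=0.123" pairs such as "precision=0.554" or "P@R-f1=0.614".
METRIC_RE = re.compile(r"([\w@-]+)=(\d+\.\d+)")

def parse_log(text):
    """Map each model name to the metrics printed on its 'Model: precision=...' line."""
    results, current = {}, None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("model: "):
            current = line[len("model: "):]
        elif current and "precision=" in line:
            results[current] = {k: float(v) for k, v in METRIC_RE.findall(line)}
    return results

# The matching entries from scoreboard_dict_list.
scoreboard = [
    {"model": "3 Nearest Neighbors with (3, 3)-Grams-CountVectorizer-vectorizer",
     "score": 0.5543478260869565},
    {"model": "AdaBoost with (4, 4)-Grams-CountVectorizer-vectorizer",
     "score": 0.803030303030303},
]

logged = parse_log(LOG_SAMPLE)
for entry in scoreboard:
    printed = logged[entry["model"]]["precision"]
    # True when the scoreboard score rounds to the precision shown in the log.
    print(entry["model"], round(entry["score"], 3) == printed)
```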
In [17]:
# Sort all models by score (precision), highest first, and keep the ten best.
top_ten = sorted(scoreboard_dict_list, key=lambda d: d['score'], reverse=True)[:10]
In [18]:
print("Top 10 models: (by precision metric)")
top_ten
Top 10 models: (by precision metric)
Out[18]:
[{'model': '10 Random Forest 5-max-depth with (4, 4)-Grams-TfidfVectorizer-vectorizer',
  'score': 0.9090909090909091},
 {'model': 'RF with (1, 2)-Grams-CountVectorizer-vectorizer',
  'score': 0.859375},
 {'model': '10 Random Forest 5-max-depth with (1, 4)-Grams-TfidfVectorizer-vectorizer',
  'score': 0.8333333333333334},
 {'model': 'RF with (2, 2)-Grams-CountVectorizer-vectorizer',
  'score': 0.819672131147541},
 {'model': '10 Random Forest 5-max-depth with (1, 1)-Grams-TfidfVectorizer-vectorizer',
  'score': 0.8194444444444444},
 {'model': '10 Random Forest 5-max-depth with (3, 4)-Grams-TfidfVectorizer-vectorizer',
  'score': 0.8076923076923077},
 {'model': 'AdaBoost with (4, 4)-Grams-CountVectorizer-vectorizer',
  'score': 0.803030303030303},
 {'model': 'AdaBoost with (1, 1)-Grams-CountVectorizer-vectorizer',
  'score': 0.7733333333333333},
 {'model': 'Decision Tree with (2, 2)-Grams-TfidfVectorizer-vectorizer',
  'score': 0.7681159420289855},
 {'model': '10 Random Forest 5-max-depth with (1, 2)-Grams-CountVectorizer-vectorizer',
  'score': 0.7611940298507462}]
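Several of the top entries are shallow random forests that rarely predict "malicious": in the log above, the first-place model pairs precision 0.909 with recall 0.135. One possible alternative, sketched below and not what the notebook does, is to require a minimum recall before ranking by precision. The rows are four models whose full metrics appear in the log; `MIN_RECALL` and `rank_with_recall_floor` are hypothetical names, and the 0.5 threshold is an arbitrary choice for illustration.

```python
# (model, precision, recall) copied from the log output above.
rows = [
    ("10 Random Forest 5-max-depth with (4, 4)-Grams-TfidfVectorizer-vectorizer", 0.909, 0.135),
    ("10 Random Forest 5-max-depth with (1, 4)-Grams-TfidfVectorizer-vectorizer", 0.833, 0.270),
    ("AdaBoost with (4, 4)-Grams-CountVectorizer-vectorizer", 0.803, 0.716),
    ("RF with (3, 4)-Grams-TfidfVectorizer-vectorizer", 0.761, 0.689),
]

MIN_RECALL = 0.5  # arbitrary illustrative threshold, not taken from the notebook

def rank_with_recall_floor(rows, min_recall):
    """Keep rows whose recall meets the floor, then sort by precision, best first."""
    kept = [r for r in rows if r[2] >= min_recall]
    return sorted(kept, key=lambda r: r[1], reverse=True)

for model, precision, recall in rank_with_recall_floor(rows, MIN_RECALL):
    print(f"{precision:.3f}  {recall:.3f}  {model}")
```

On these four rows the filter drops both shallow random forests and leaves AdaBoost with (4, 4)-grams first. It says nothing about the models that are not listed here.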

the end (: